PARQUET-660: Ignore extension fields in protobuf messages. #351
    // they have to be ignored.
    continue;
  }
Another option would be to throw an exception here and explicitly not support messages with extensions.
I could also log a warning here, though only once, to avoid the noise.
I like the approach of throwing an exception. It is easy to add support for a special case later. It is harder to change behavior as it breaks existing applications.
For example, in Avro we explicitly throw in the case of recursive schemas.
@lukasnalezenec what do you think?
An exception is a good idea, but I am not sure whether it is an acceptable change of behaviour.
Extension fields can have overlapping field indexes with base fields. Writing extension fields can result in unexpected data corruption or an error.
@lukasnalezenec I adjusted the PR to throw an exception in 27580ab. After some consideration, I also think this is a better solution. The PR doesn't introduce any change in behaviour: an informative exception is thrown instead of the confusing one that would otherwise follow.
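To make the failure mode and the guard discussed above concrete, here is a minimal, self-contained sketch. It deliberately does not use the protobuf or parquet-protobuf APIs: `FieldInfo`, `slotsFor`, `resolveUnchecked`, and `resolveChecked` are hypothetical names invented for this illustration, and the exception message is made up. It assumes that an extension's index is counted within its own scope, so it can coincide with (or exceed) the indexes of the base fields.

```java
import java.util.List;

final class ExtensionGuardSketch {

    /** Hypothetical stand-in for a field descriptor (not the protobuf API). */
    static final class FieldInfo {
        final String name;
        final int index;          // position within the declaring scope
        final boolean extension;  // true for extension fields

        FieldInfo(String name, int index, boolean extension) {
            this.name = name;
            this.index = index;
            this.extension = extension;
        }
    }

    /** Builds one writer slot per base field, addressed by the field's index. */
    static String[] slotsFor(List<FieldInfo> baseFields) {
        String[] slots = new String[baseFields.size()];
        for (FieldInfo field : baseFields) {
            slots[field.index] = "writer:" + field.name;
        }
        return slots;
    }

    /**
     * Looks up a slot without checking for extensions. An extension whose
     * index overlaps a base field silently gets that field's writer; one
     * whose index is out of range fails with ArrayIndexOutOfBoundsException.
     */
    static String resolveUnchecked(String[] slots, FieldInfo field) {
        return slots[field.index];
    }

    /** Same lookup, but rejects extension fields with an explicit error. */
    static String resolveChecked(String[] slots, FieldInfo field) {
        if (field.extension) {
            throw new UnsupportedOperationException(
                "Extension field '" + field.name + "' is not supported");
        }
        return slots[field.index];
    }
}
```

In this model, an extension with index 0 resolves to the writer of the first base field when unchecked, which corresponds to the data-corruption case; the checked variant fails early with a message that names the offending field.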
- fieldWriters[i] = writer;
- i++;
+ fieldWriters[fieldDescriptor.getIndex()] = writer;
Is this the same change?
Yes, fields in the list over which we're iterating are ordered by index. Here's how the list is built:
https://github.com/google/protobuf/blob/master/java/core/src/main/java/com/google/protobuf/Descriptors.java#L826
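The equivalence claimed here can be checked with a small, self-contained sketch. It is not the parquet-protobuf code: the `Field` class and both `fill...` methods are hypothetical, and it assumes the field list is ordered by index with indexes running contiguously from 0.

```java
import java.util.List;

final class IndexEquivalenceSketch {

    /** Hypothetical stand-in for a field descriptor: a name plus its index. */
    static final class Field {
        final String name;
        final int index;

        Field(String name, int index) {
            this.name = name;
            this.index = index;
        }
    }

    /** Old style: a separate counter tracks the target slot. */
    static String[] fillWithCounter(List<Field> fields) {
        String[] writers = new String[fields.size()];
        int i = 0;
        for (Field field : fields) {
            writers[i] = field.name;
            i++;
        }
        return writers;
    }

    /** New style: the field's own index selects the target slot. */
    static String[] fillWithIndex(List<Field> fields) {
        String[] writers = new String[fields.size()];
        for (Field field : fields) {
            writers[field.index] = field.name;
        }
        return writers;
    }
}
```

For a list ordered by index (0, 1, 2, ...), both methods produce the same array. They would diverge only if the iteration order differed from the index order, which is the case the reviewer's question is probing.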
This looks good to me.
Currently, converting protobuf messages with extension can result in an uninformative error or a data corruption. A more detailed explanation in the corresponding [jira](https://issues.apache.org/jira/browse/PARQUET-660). This patch simply ignores extension fields in protobuf messages. In the longer run, I'd like to add a proper support for Protobuf extensions. This might take a little longer though, so I've decided to improve the current situation with this patch.

Author: Jakub Kukul <jakub.kukul@gmail.com>

Closes apache#351 from jkukul/master and squashes the following commits:

27580ab [Jakub Kukul] PARQUET-660: Throw Unsupported exception for messages with extensions.
db6e08b [Jakub Kukul] PARQUET-660: Refactor: Don't use additional variable for indexing fieldWriters.
e910a8a [Jakub Kukul] PARQUET-660: Refactor: Add missing indentation.
Is there any movement on this item, i.e. on supporting serialisation of protobuf extensions?
Currently, converting protobuf messages with extensions can result in an uninformative error or data corruption. A more detailed explanation is in the corresponding jira.
This patch simply ignores extension fields in protobuf messages.
In the longer run, I'd like to add proper support for Protobuf extensions. This might take a little longer though, so I've decided to improve the current situation with this patch.